The Best 56 Audio Generation Tools in 2025

Musicgen Medium
MusicGen is a text-to-music model that generates high-quality music samples based on text descriptions or audio prompts, utilizing a 1.5-billion-parameter autoregressive Transformer architecture.
Audio Generation, Transformers
facebook
1.5M
118
Encodec 24khz
EnCodec is a high-fidelity real-time neural audio codec developed by Meta AI, employing end-to-end training and supporting multiple bandwidth settings.
Audio Generation, Transformers
facebook
534.08k
46
Encodec 32khz
A high-fidelity, real-time neural audio codec developed by Meta AI and trained specifically for the MusicGen project.
Audio Generation, Transformers
facebook
348.00k
19
Bigvgan V2 44khz 128band 512x
MIT
BigVGAN is a universal neural vocoder based on large-scale training, capable of generating high-quality audio waveforms.
Audio Generation
nvidia
223.13k
41
Musicgen Small
MusicGen is a text-to-music model that generates high-quality music samples based on text descriptions or audio prompts.
Audio Generation, Transformers
facebook
123.91k
429
Stable Audio Open 1.0
Other
Stable Audio Open 1.0 is a text-to-audio generation model capable of generating up to 47 seconds of 44.1kHz stereo audio based on text prompts.
Audio Generation, English
stabilityai
36.03k
1,170
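As a rough illustration of what the 47-second, 44.1kHz stereo figure in the Stable Audio Open 1.0 entry means in raw data, the sketch below computes the number of audio frames, individual samples, and uncompressed size. The 16-bit PCM sample width is an assumption made for the arithmetic, not something the listing states, and `pcm_size` is a hypothetical helper.

```python
def pcm_size(duration_s, sample_rate_hz, channels, bytes_per_sample=2):
    """Return (frames, samples, bytes) for uncompressed PCM audio.

    bytes_per_sample=2 assumes 16-bit PCM, which is an illustrative choice.
    """
    frames = round(duration_s * sample_rate_hz)  # one frame = one sample per channel
    samples = frames * channels                  # stereo doubles the sample count
    return frames, samples, samples * bytes_per_sample


frames, samples, size_bytes = pcm_size(47, 44_100, 2)
print(frames, samples, size_bytes)  # 2072700 4145400 8290800
```

Under that assumption, a maximum-length clip is about 2.07 million frames, or roughly 8.3 MB of raw audio.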
Bigvgan V2 24khz 100band 256x
MIT
BigVGAN is a high-performance neural vocoder that achieves high-quality audio synthesis through large-scale training, supporting multiple sampling rates and frequency band configurations.
Audio Generation
nvidia
34.03k
14
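The BigVGAN entries above encode their configuration in the model name. A minimal sketch of reading those names, assuming the three numeric fields denote the nominal sampling rate in kHz, the number of mel bands, and the upsampling ratio from mel frames to waveform samples. This reading of the naming scheme is an assumption, and `parse_bigvgan_name` is a hypothetical helper, not part of any library.

```python
import re


def parse_bigvgan_name(name):
    """Split a name such as 'Bigvgan V2 24khz 100band 256x' into numeric fields.

    The meaning assigned to each field is an assumption about the naming scheme.
    """
    match = re.search(r"(\d+)khz[\s_]+(\d+)band[\s_]+(\d+)x", name.lower())
    if match is None:
        raise ValueError(f"unrecognized name: {name!r}")
    khz, bands, ratio = (int(group) for group in match.groups())
    return {"nominal_khz": khz, "mel_bands": bands, "upsample_ratio": ratio}


for listed_name in ("Bigvgan V2 44khz 128band 512x", "Bigvgan V2 24khz 100band 256x"):
    print(listed_name, parse_bigvgan_name(listed_name))
```

If the upsampling-ratio reading holds, each mel frame of the 24khz variant would expand to 256 waveform samples, and each frame of the 44khz variant to 512.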
Encodec 48khz
MIT
EnCodec is a real-time high-fidelity neural audio codec developed by Meta AI, supporting multiple bandwidth configurations and streaming processing.
Audio Generation, Transformers
facebook
23.25k
32
Musicgen Songstarter V0.2
A text-to-audio model fine-tuned from musicgen-stereo-melody-large that generates 32kHz stereo song ideas for music producers.
Audio Generation, English
nateraw
22.11k
157
Musicgen Stereo Small
A 300M-parameter model that generates high-quality stereo music samples from text descriptions.
Audio Generation, Transformers
facebook
7,091
29
Musicgen Small
MusicGen Small is a Transformer-based music generation model capable of producing high-quality music clips from text descriptions.
Audio Generation, Transformers
Xenova
5,434
24
Musicgen Large
MusicGen is a text-to-music generation model capable of producing high-quality music samples based on text descriptions or audio prompts.
Audio Generation, Transformers
facebook
5,125
448
Musicgen Melody
MusicGen is a simple and controllable music generation model capable of producing high-quality music based on text descriptions or melody inputs.
Audio Generation, Transformers
facebook
3,632
216
Musicgen Melody Large
MusicGen is a text-to-music generation model developed by Meta AI, capable of producing high-quality music samples based on text descriptions or audio prompts.
Audio Generation, Transformers
facebook
1,414
29
Ace Gguf
Apache-2.0
ACE-Step-v1-3.5B is a text-to-audio model that supports high-quality audio generation, suitable for music and sound effects creation.
Audio Generation
calcuis
1,332
12
Stable Audio Open Small
Other
A diffusion model that generates up to 11 seconds of 44.1kHz stereo audio from text prompts.
Audio Generation, English
stabilityai
1,171
141
Stable Codec Speech 16k
Other
A high-quality, low-bitrate speech codec based on the Transformer architecture, designed for speech compression and generative modeling.
Audio Generation, English
stabilityai
1,072
17
Magnet Small 10secs
MAGNeT is a text-to-music and text-to-audio model capable of generating high-quality audio samples from text descriptions.
Audio Generation
facebook
976
22
ACE Step V1 Chinese Rap LoRA
Apache-2.0
A hybrid rap vocal model focused on improving the generation quality of Chinese rap and hip-hop music.
Audio Generation, Supports Multiple Languages
ACE-Step
896
15
Slam Scaled
MIT
A high-quality speech language model trained on a single GPU within 24 hours, fine-tuned from Qwen2.5-0.5B and using HuBERT tokens as its vocabulary.
Audio Generation, Transformers
slprl
792
6
Inspiremusic 1.5B Long
Apache-2.0
InspireMusic is a unified toolkit focused on music generation, song generation, and audio generation, supporting high-fidelity and long-form music generation.
Audio Generation, Safetensors, English
FunAudioLLM
760
28
Tangoflux
TangoFlux is an efficient text-to-audio generation system that combines flow matching with CLAP-Ranked Preference Optimization (CRPO) to quickly produce high-quality audio.
Audio Generation
declare-lab
727
94
Audio Magnet Medium
MAGNeT is a non-autoregressive Transformer-based text-to-music and sound effects generation model capable of producing high-quality audio samples from text descriptions.
Audio Generation
facebook
435
34
Magnet Medium 30secs
MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples from text descriptions.
Audio Generation
facebook
409
36
Musicgen Stereo Large
MusicGen is a text-to-music generation model developed by Meta AI, supporting stereo generation and capable of producing high-quality music samples based on text descriptions or audio prompts.
Audio Generation, Transformers
facebook
382
74
Magnet Medium 10secs
MAGNeT is a text-to-music and text-to-sound model that can generate high-quality audio samples based on text descriptions.
Audio Generation
facebook
322
8
Yue S2 1B General Exl2 8.0bpw
Apache-2.0
YuE is a groundbreaking open-source foundational model series specifically designed for music generation, particularly for converting lyrics into complete songs (lyrics2song).
Audio Generation
Alissonerdx
310
1
Musicgen Stereo Medium
A stereo music generation model released by Meta AI that generates high-quality music from text descriptions.
Audio Generation, Transformers
facebook
303
30
Magnet Small 30secs
MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples from text descriptions.
Audio Generation
facebook
215
8
Sentis MusicGen
MIT
A Meta MusicGen model verified by Unity Sentis that can generate stylized music up to 30 seconds long based on text prompts.
Audio Generation
unity
174
17
Audio Magnet Small
MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples based on text descriptions. It is a non-autoregressive Transformer model based on masked generation, using a 32kHz EnCodec tokenizer.
Audio Generation
facebook
161
9
Perceiver Ar Sam Giant Midi
Apache-2.0
A symbolic audio model based on the Perceiver AR architecture, pre-trained on the GiantMIDI-Piano dataset for symbolic audio generation.
Audio Generation, Transformers
krasserm
153
11
Tango2
Tango 2 builds on Tango, improving text-to-audio generation quality through Direct Preference Optimization (DPO) alignment training.
Audio Generation, Transformers, English
declare-lab
147
17
Yue S1 7B Anneal Jp Kr Icl
Apache-2.0
YuE is a series of open-source foundational models specifically designed for music generation, particularly for converting lyrics into complete songs (lyrics2song).
Audio Generation, Safetensors
m-a-p
136
11
Tango
TANGO is an instruction-guided diffusion model for text-to-audio generation, capable of producing realistic audio including human voices, animal sounds, and natural or artificial sound effects based on text prompts.
Audio Generation, Transformers, English
declare-lab
118
41
Slam
MIT
A speech language model based on discrete HuBERT tokens that focuses on efficient training and can generate continuations of speech segments.
Audio Generation, Transformers
slprl
115
10
Tunesformer
MIT
TunesFormer is a Transformer-based dual-decoder model designed to generate melodies that conform to user-defined musical forms, especially suitable for traditional Irish music.
Audio Generation, Transformers
sander-wood
90
6
Musiclang 4k
GPL-3.0
A generative AI model for MIDI music creation that supports generation from scratch or continuation from a template.
Audio Generation, Transformers
musiclang
83
17
Musicgen Stereo Melody
MusicGen is a text-to-music generation model developed by Meta AI, capable of producing high-quality stereo music samples based on text descriptions or audio prompts.
Audio Generation, Transformers
facebook
82
10
Music Large 800k
Apache-2.0
This is a large Transformer model with 780 million parameters, specifically designed for music generation and transcription tasks, using anticipatory training methods.
Audio Generation, Transformers
stanford-crfm
73
27
Tango2 Full
Tango 2 builds on Tango, using Direct Preference Optimization (DPO) to align text-to-audio generation.
Audio Generation, Transformers, English
declare-lab
63
9
Inspiremusic 1.5B 24kHz
Apache-2.0
InspireMusic is a unified framework focused on music generation, song generation, and audio generation, integrating autoregressive transformers with flow-matching models through audio tokenization technology, supporting high-quality long audio generation.
Audio Generation, English
FunAudioLLM
62
6
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025 AIbase